Reduced complexity dynamic programming based on policy iteration


Similar articles

On the Complexity of Policy Iteration

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...
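
The abstract above is truncated, but the algorithm it analyzes is standard. Below is a minimal sketch of exact policy iteration on a finite discounted MDP; the transition tensor `P[a, s, s']` and reward table `R[a, s]` are illustrative conventions of this sketch, not notation from the paper.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Exact policy iteration on a finite MDP.

    P: transition probabilities, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
    R: expected immediate rewards, shape (A, S)
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)            # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S)]         # transition rows under the policy, (S, S)
        R_pi = R[policy, np.arange(S)]         # rewards under the policy, (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the current value.
        Q = R + gamma * P @ V                  # one-step lookahead, shape (A, S)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V                   # a stable policy is optimal
        policy = new_policy
```

Each improvement step is monotone, so the loop terminates after finitely many iterations; the question the paper studies is how many, with bounds that do not depend on gamma.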


Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iter...
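
For reference, the value-iteration backup that such unified algorithms generalize is a repeated Bellman optimality update. A minimal sketch, using the same illustrative array shapes as the policy-iteration example above:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Approximate value iteration: apply the Bellman optimality backup
    until the sup-norm change between iterates falls below tol."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * P @ V                # one-step lookahead, shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # value estimate and greedy policy
        V = V_new
```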


An Efficient Policy Iteration Algorithm for Dynamic Programming Equations

We present an accelerated algorithm for the solution of static Hamilton-Jacobi-Bellman equations related to optimal control problems. Our scheme is based on a classic policy iteration procedure, which is known to have superlinear convergence in many relevant cases provided the initial guess is sufficiently close to the solution. This limitation often degenerates into a behavior similar to a valu...
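
The acceleration strategy, as far as the truncated abstract reveals it, pairs policy iteration's fast local convergence with a cheaper globally convergent scheme that supplies a good initial guess. A hedged sketch of that switching idea on a discounted MDP stand-in (the paper itself treats discretized Hamilton-Jacobi-Bellman equations, so this is an analogy, not the paper's algorithm):

```python
import numpy as np

def vi_then_pi(P, R, gamma=0.9, switch_tol=1e-2):
    """Run cheap value-iteration sweeps until the iterates are roughly
    stationary, then refine with exact policy iteration, which converges
    superlinearly once the initial guess is close enough."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:                                 # coarse phase: value iteration
        V_new = (R + gamma * P @ V).max(axis=0)
        done = np.max(np.abs(V_new - V)) < switch_tol
        V = V_new
        if done:
            break
    policy = (R + gamma * P @ V).argmax(axis=0)
    while True:                                 # refinement phase: policy iteration
        P_pi = P[policy, np.arange(S)]
        R_pi = R[policy, np.arange(S)]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        new_policy = (R + gamma * P @ V).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```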


Iteration-complexity for cone programming

In this paper we consider the general cone programming problem, and propose primal-dual convex (smooth and/or nonsmooth) minimization reformulations for it. We then discuss first-order methods suitable for solving these reformulations, namely, Nesterov’s optimal method (Nesterov in Doklady AN SSSR 269:543–547, 1983; Math Program 103:127–152, 2005), Nesterov’s smooth approximation scheme (Nester...
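
Of the methods listed, Nesterov's optimal method is the simplest to write down. A sketch of the accelerated gradient scheme for a convex objective with an L-Lipschitz gradient, in the common FISTA-style restatement rather than the exact 1983 formulation:

```python
import numpy as np

def nesterov_accelerated_gradient(grad, x0, L, n_iters=1000):
    """Accelerated gradient descent with O(1/k^2) objective error.

    grad: callable returning the gradient at a point
    L: Lipschitz constant of the gradient (fixed step size 1/L)
    """
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(n_iters):
        x_next = y - grad(y) / L                        # gradient step at the extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2       # momentum schedule
        y = x_next + ((t - 1) / t_next) * (x_next - x)  # extrapolation
        x, t = x_next, t_next
    return x
```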


Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming

We introduce a new policy iteration method for dynamic programming problems with discounted and undiscounted cost. The method is based on the notion of temporal differences, and is primarily geared to the case of large and complex problems where the use of approximations is essential. We develop the theory of the method without approximation, we describe how to embed it within a neuro-dynamic p...
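
The abstract does not spell out the update, but the temporal-difference building block it refers to is, in its simplest tabular form, TD(0) policy evaluation. A sketch under assumed interfaces (`env_step` and the random start states are conventions of this sketch, not the paper's setting):

```python
import numpy as np

def td0_evaluate(env_step, policy, S, gamma=0.9, alpha=0.05,
                 n_episodes=500, horizon=200):
    """Tabular TD(0): nudge V[s] toward the bootstrapped one-step target
    r + gamma * V[s'] after every observed transition.

    env_step(s, a) -> (reward, next_state, done) is an assumed interface.
    """
    V = np.zeros(S)
    rng = np.random.default_rng(0)
    for _ in range(n_episodes):
        s = rng.integers(S)                       # random start state (assumption)
        for _ in range(horizon):
            r, s_next, done = env_step(s, policy[s])
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])       # temporal-difference update
            if done:
                break
            s = s_next
    return V
```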



Journal

Journal title: Journal of Mathematical Analysis and Applications

Year: 1992

ISSN: 0022-247X

DOI: 10.1016/0022-247x(92)90007-z